Cross-variety speaker transformation in HSMM-based speech synthesis

نویسندگان

  • Markus Toman
  • Michael Pucher
  • Dietmar Schabus
چکیده

We present and compare different approaches for crossvariety speaker transformation in Hidden Semi-Markov Model (HSMM) based speech synthesis that allow for a transformation of an arbitrary speaker’s voice from one variety to another one. The methods developed are applied to three different varieties, namely standard Austrian German, one Middle Bavarian (Upper Austria, Bad Goisern) and one South Bavarian (East Tyrol, Innervillgraten) dialect. For data mapping of HSMMstates we use Kullback-Leibler divergence, transfer probability density functions to the decision tree of the other variety and perform speaker adaptation. We investigate an existing data mapping method and a method that constrains the mappings for common phones and show that both methods can retain speaker similarity and variety similarity. Furthermore we show that in some cases the constrained mapping method gives better results than the standard method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-variety adaptive acoustic modeling in HSMM-based speech synthesis

In this paper we apply adaptive modeling methods in Hidden Semi-Markov Model (HSMM) based speech synthesis to the modeling of three different varieties, namely standard Austrian German, one Middle Bavarian (Upper Austria, Bad Goisern), and one South Bavarian (East Tyrol, Innervillgraten) dialect. We investigate different adaptation methods like dialectadaptive training and dialect clustering th...

متن کامل

Average-Voice-Based Speech Synthesis

This thesis describes a novel speech synthesis framework " Average-Voice-based Speech Synthesis. " By using the speech synthesis framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even if speech samples available for the target speaker are very small. This speech synthesis framework consists of speaker normalization algorithm for the parameter cluster...

متن کامل

Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis

This paper describes the use of combined linear regression and expost MAP methods for average-voice-based speech synthesis system based on HMM. To generate more natural sounding speech using the average-voice-based speech synthesis system when a large amount of training data is available, we apply ex-post MAP estimation after the linear transformation based adaptation. We investigate how the am...

متن کامل

Structural Kld for Cross-variety Speaker Adaptation in Hmm-based Speech Synthesis

While the synthesis of natural sounding, neutral style speech can be achieved using today’s technology, fast adaptation of speech synthesis to different contexts and situations still poses a challenge. In the context of variety modeling (dialects, sociolects) we have to cope with the problem that no standardized orthographic form is available and that existing speech resources for these varieti...

متن کامل

Constrained Structural Maximum A for Average-Voice-Based

This paper proposes a constrained structural maximum a posteriori linear regression (CSMAPLR) algorithm for further improvement of speaker adaptation performance in HMM-based speech synthesis. In the algorithm, the concept of structural maximum a posteriori (SMAP) adaptation is applied to estimation of transformation matrices of the constrained MLLR (CMLLR), where recursive MAP-based estimation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013